Abstract. Making recommendations in the presence of sparsity is known to be one of the most challenging problems faced
by collaborative filtering methods. In this work we tackle this problem by exploiting the innately hierarchical structure of the
item space, following an approach inspired by the theory of Decomposability. We view the item space as a Nearly Decomposable
system and we define blocks of closely related elements and corresponding indirect proximity components. We study the
theoretical properties of the decomposition and we derive sufficient conditions that guarantee full item space coverage even in
cold-start recommendation scenarios. A comprehensive set of experiments on the MovieLens and the Yahoo!R2Music datasets, using
several widely applied performance metrics, supports our model's theoretically predicted properties and verifies that NCDREC
outperforms several state-of-the-art algorithms in terms of recommendation accuracy, diversity and sparseness insensitivity.
1. Introduction


Recommender Systems (RS) are information filtering tools that have been widely
adopted over the past decade by the majority of e-commerce sites, in order to
make intelligent personalized product suggestions to their customers. RS
technology enhances user experience and it is known to increase user fidelity
to the system. Correspondingly, from an economic perspective, the utilization
of recommender systems is known to assist in building bigger and more loyal
customer bases, and to drive a significant increase in the volume of product
sales.

The development of recommender systems is - in a very fundamental sense - based
on a rather simple observation: people very often base their everyday decisions
on advice and suggestions provided by the community. For example, it is very
common, when one wants to pick a new movie to watch, to take into consideration
published reviews about the movie or to ask friends for their opinion.
Mimicking this behavior, recommender systems exploit the plethora of
information produced by the interactions of a large community of users, and try
to deliver personalized suggestions that aim to help an active user cope with
the overwhelming number of options in front of them.

Among the several different approaches to building recommender systems,
Collaborative Filtering (CF) is widely regarded as one of the most successful
ones. CF methods basically model both users and items as sets of ratings, and
focus on the sparse rating matrix that lies at the common core, trying to
either estimate the missing values, or find promising cells to propose (see
Figure 1). In the majority of CF related work, for reasons of mathematical
convenience (as well as fitness with formal optimization methods), the
recommendation task reduces to predicting the ratings for all the unseen
user-item pairs (prediction-based methods). Recently, however, many leading
researchers have turned significant attention to ranking-based methods, which
are believed to conform more naturally with how the recommender system will
actually be used in practice.

Fig. 1. Example Recommender System

Despite their success in many application settings, RS techniques suffer from a
number of problems that remain to be resolved. One of the most important such
problems arises from the fact that the available data are often insufficient
for identifying similar elements; it is commonly referred to as the Sparsity
Problem. Sparsity imposes serious limitations on the quality of
recommendations, and it is known to decrease significantly the diversity and
the effectiveness of CF methods - especially in recommending unpopular items
(“long tail” problem). Unfortunately, sparsity is an intrinsic characteristic
of recommender systems because in the majority of realistic applications users
typically interact with only a small portion of the available items, and the
problem is aggravated even more by the fact that new users with no ratings at
all are regularly added to the system (Cold-Start problem [36]).

Among the most promising approaches in dealing with limited coverage and
sparsity are graph-based methods. The methods of this family exploit transitive
relations in the data, which makes them able to estimate the relationship
between users and items that are not directly connected. Gori and Pucci
proposed ItemRank, a PageRank-inspired scoring algorithm that produces a
personalized ranking vector using a random walk with restarts on an item
correlation graph induced by the ratings. Fouss et al. create a graph model of
the RS database and they present a number of methods to compute node similarity
measures, including the random walk-related average Commute Time and average
First Passage Time, as well as the pseudo-inverse of the graph's Laplacian.
They compare their methods against other state-of-the-art graph-based
approaches, such as the sophisticated node similarity measure that integrates
indirect paths in the graph, based on the matrix-forest theorem, and a
similarity measure based on the well known Katz algorithm [23].

Here, we attack the sparsity problem from a different perspective. The fact
that sparsity has been commonly observed in models of seemingly unrelated
naturally emerging systems suggests an even more fundamental cause behind this
phenomenon. According to Herbert A. Simon, this inherent sparsity is
intertwined with the structural organization and the evolutionary viability of
these systems. In his seminal work on the architecture of complexity [44], he
argued that the majority of sparse hierarchically structured systems share the
property of having a Nearly Completely Decomposable (NCD) architecture: they
can be seen as comprised of a hierarchy of interconnected blocks, sub-blocks
and so on, in such a way that elements within any particular such block relate
much more vigorously with each other than do elements belonging to different
blocks, and this property holds between any two levels of the hierarchy.

The analysis of decomposable systems was pioneered by Simon and Ando [45], who
reported on state aggregation in linear models of economic systems, but the
universality and the versatility of Simon's idea have permitted the theory to
be used in many complex problems from diverse disciplines, ranging from
economics, cognitive theory and social sciences, to computer systems
performance evaluation, data mining and information retrieval.

The criteria behind the decomposition vary with the goals of the study and the
nature of the problem under consideration. For example, in the stochastic
modeling literature, decomposability is usually found in the time domain and
the blocks are defined to separate the short-term from the long-term temporal
dynamics. In other cases the decomposition is chosen to highlight known
structural properties of the underlying space; for example in the field of link
analysis, many leading researchers have exploited the nearly decomposable
structure of the Web, from a computational (faster extraction of the PageRank
vector) as well as a qualitative (generalization of the random surfer
teleportation model) perspective [22, 33].

In this work¹, building on the intuition behind NCD, we decompose the item
space into blocks, and we use these blocks to characterize the inter-item
proximity at a macroscopic level. Central to our approach is the idea that
blending together the direct with the indirect inter-item relations can help
reduce the sensitivity to sparseness and improve the quality of
recommendations. To this end, we propose NCDREC, a novel ranking-based
recommendation method which:


- Provides a theoretical framework that enables the exploitation of the item
space's innately decomposable structure in an efficient and scalable way.

- Produces recommendations that outperform several state-of-the-art methods in
widely used metrics (Section 3.2), achieving high quality results even in the
generally harder task of recommending long-tail items (Section 3.3).

- Displays low sensitivity to the problems caused by the sparsity of the
underlying space and treats New Users more fairly; this is supported both by
NCDREC's theoretical properties (Section 2.2.4) and our experimental findings
(Section 3.4).

The rest of the paper is organized as follows. In Section 2, after discussing
briefly the intuition behind the exploitation of Decomposability for
recommendations, we introduce formally our model and we study several of its
interesting theoretical properties (Section 2.2). In Section 2.3 we present the
NCDREC algorithm and we discuss its storage and computational aspects. Our
testing methodology and experimental results are presented in Section 3.
Finally, Section 4 concludes this paper and outlines directions for future
work.


¹This work significantly extends our initial contribution, adding a detailed
presentation of the NCDREC model enriched by thorough explanations and
examples, as well as a rigorous theoretical analysis of its constituent parts.
Furthermore, in this paper we provide a more in-depth coverage of related
literature, including thorough discussions of the competing state-of-the-art
recommendation techniques as well as details regarding their implementation in
our experiments.


2. NCDREC Framework 

2.1. Exploiting Decomposability for Recommendations


In the method we propose in this work, we see the set of items as a
decomposable space and, following the modeling approach of a recently proposed
Web ranking framework [33, 34], we use the decomposition to characterize
macro-relations between the elements of the dataset that can hopefully refine
and augment the underlying collaborative filtering approach and “fill in” some
of the void left by the intrinsic sparsity of the data. The criteria behind the
decomposition can vary with the particular aspects of the item space, the
information available etc. For example, if one wants to recommend hotels, the
blocks may be defined to depict geographic information; in the movie
recommendation problem, the blocks may correspond to the categorization of
movies into genres, or other movie attributes etc. To give our framework
maximum flexibility, we extend the notion to allow overlapping blocks;
intuitively this seems to be particularly useful in many modeling approaches
and recommendation problems.

Before we proceed to the rigorous definition of the NCDREC framework, we
outline briefly our approach: First, we define a decomposition,
$\mathcal{D}$, of the item space into blocks and we introduce the notion of
$\mathcal{D}$-proximity, to characterize the implicit inter-level relations
between the items. Then, we translate this proximity notion to suitably defined
matrices that quantify these macroscopic inter-item relations under the prism
of the chosen decomposition. These matrices need to be easy to handle in order
for our method to be applicable in realistic scenarios. Furthermore, their
contribution to the final model needs to be weighted carefully so as not to
“overshadow” the pure collaborative filtering parts of the model. To achieve
this, we follow an approach based on perturbing the standard CF parts, using
suitably defined low-rank matrices. Finally, to fight the inevitably extreme
and localized sparsity related to cold-start scenarios we create a Markov
chain-based subcomponent, designed to increase the percentage of the item space
covered by the produced recommendations, and we study the conditions (in terms
of theoretical properties of the proposed decomposition) under which full item
space coverage is guaranteed.
2.2. NCDREC Model and Theoretical Properties

2.2.1. Notation 

All vectors are represented by bold lower case letters and they are column
vectors (e.g., $\boldsymbol{\omega}$). All matrices are represented by bold
upper case letters (e.g., $\mathbf{W}$). The $i$-th row and $j$-th column of
matrix $\mathbf{W}$ are denoted $\mathbf{w}_i^\intercal$ and $\mathbf{w}_j$,
respectively. The $ij$-th element of matrix $\mathbf{W}$ is denoted
$[\mathbf{W}]_{ij}$. We use $\mathrm{diag}(\boldsymbol{\omega})$ to denote the
matrix having vector $\boldsymbol{\omega}$ on its diagonal, and zeros
elsewhere. We use calligraphic letters to denote sets (e.g., $\mathcal{U}$,
$\mathcal{V}$). Finally, symbol $\triangleq$ is used in definition statements.

2.2.2. Definitions 

Let $\mathcal{U} = \{u_1, \ldots, u_n\}$ be a set of users,
$\mathcal{V} = \{v_1, \ldots, v_m\}$ a set of items and $\mathcal{R}$ a set of
tuples

$$t_{ij} = (u_i, v_j, r_{ij}) \tag{1}$$

where $r_{ij}$ is a nonnegative number referred to as the rating given by user
$u_i$ to the item $v_j$. Each user in $\mathcal{U}$ is assumed to have rated at
least one item; similarly each item in $\mathcal{V}$ is assumed to have been
rated by at least one user.

We define an associated user-item rating matrix
$\mathbf{R} \in \mathbb{R}^{n \times m}$, whose $ij$-th element equals $r_{ij}$
if $t_{ij} \in \mathcal{R}$, and zero otherwise. For each user $u_i$, we denote
$\mathcal{R}_i$ the set of items rated by $u_i$ in $\mathcal{R}$, and we define
a preference vector $\boldsymbol{\omega} \triangleq [\omega_1, \ldots, \omega_m]$,
whose nonzero elements contain the user's ratings that are included in
$\mathcal{R}_i$, normalized to sum to one.

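We consider an indexed family of non-empty sets

$$\mathcal{D} \triangleq \{\mathcal{D}_1, \ldots, \mathcal{D}_K\} \tag{2}$$

that defines a $\mathcal{D}$-decomposition of the underlying space
$\mathcal{V}$, such that $\mathcal{V} = \bigcup_{k=1}^{K} \mathcal{D}_k$. Each
set $\mathcal{D}_k$ is referred to as a $\mathcal{D}$-block, and its elements
are considered related according to some criterion.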

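We define

$$\mathscr{D}_v \triangleq \bigcup_{v \in \mathcal{D}_k} \mathcal{D}_k \tag{3}$$

to be the proximal set of items of $v \in \mathcal{V}$, i.e. the union of the
$\mathcal{D}$-blocks that contain $v$. We use $N_v$ to denote the number of
different blocks in $\mathscr{D}_v$, and

$$n_{u_i}^{l} \triangleq |\{r_{ik} : (r_{ik} > 0) \wedge (v_k \in \mathcal{D}_l)\}| \tag{4}$$

for the number of items rated by user $u_i$ that belong to the
$\mathcal{D}$-block $\mathcal{D}_l$. Every $\mathcal{D}$-decomposition is also
associated with an undirected graph

$$\mathcal{G}_{\mathcal{D}} \triangleq (\mathcal{V}_{\mathcal{D}}, \mathcal{E}_{\mathcal{D}}) \tag{5}$$

Its vertices correspond to the $\mathcal{D}$-blocks, and an edge between two
vertices exists whenever the intersection of these blocks is a non-empty set.
This graph is referred to as the block coupling graph for the
$\mathcal{D}$-decomposition.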

Finally, with every $\mathcal{D}$-decomposition we associate an Aggregation
matrix $\mathbf{A}_{\mathcal{D}} \in \mathbb{R}^{m \times K}$, whose $jk$-th
element is 1 if $v_j \in \mathcal{D}_k$, and zero otherwise.
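The definitions above can be made concrete with a small sketch. The block memberships below are a hypothetical toy decomposition of 8 items (0-based indices) into 3 overlapping blocks, and all function names are illustrative assumptions rather than part of the NCDREC implementation.

```python
from itertools import combinations

# Hypothetical toy decomposition: 8 items (0..7) in 3 overlapping blocks.
blocks = [{0, 1, 3}, {2, 3, 4, 7}, {1, 4, 5, 6, 7}]
m = 8

def aggregation_matrix(blocks, m):
    """A[j][k] = 1 if item j belongs to block k, and 0 otherwise."""
    return [[1 if j in block else 0 for block in blocks] for j in range(m)]

def proximal_set(v, blocks):
    """Union of all blocks that contain item v."""
    return set().union(*(b for b in blocks if v in b))

def block_count(v, blocks):
    """N_v: the number of blocks that contain item v."""
    return sum(1 for b in blocks if v in b)

def coupling_graph(blocks):
    """Edges (a, b) between blocks whose intersection is non-empty."""
    return {(a, b) for a, b in combinations(range(len(blocks)), 2)
            if blocks[a] & blocks[b]}

def is_connected(num_vertices, edges):
    """Depth-first search over an undirected graph given as an edge set."""
    adj = {v: set() for v in range(num_vertices)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adj[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == num_vertices

A = aggregation_matrix(blocks, m)
edges = coupling_graph(blocks)
```

In this toy case every pair of blocks shares at least one item, so the block coupling graph is a triangle and therefore connected.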

2.2.3. Main Component 

The pursuit of ranking-based recommendations grants us the flexibility of not
caring about the exact recommendation scores; only the correct item ordering is
needed. This allows us to manipulate the missing values of the rating matrix in
an “informed” way, so as to introduce some preliminary ordering based on the
user's expressed opinions about some items and the way these items relate to
the rest of the item space.

The existence of such connections is rooted in the idea that a user's rating,
besides expressing a direct opinion about a particular item, also gives a clue
about the user's opinion regarding the proximal set of this item. So,
“propagating” these opinions through the decomposition to the many related
elements of the item space can hopefully refine the estimation of the user's
preferences regarding the vast fraction of the item set for which no opinions
have been expressed, and introduce an ordering between the zeros in the rating
matrix that will hopefully relieve sparsity related problems.

Having this in mind, we perturb the user-item rating matrix $\mathbf{R}$ with
an NCD preferences matrix $\mathbf{W}$ that propagates the expressed user
opinions about particular items to the proximal sets. The resulting matrix is
given by:

$$\mathbf{G} \triangleq \mathbf{R} + \epsilon\mathbf{W}, \tag{6}$$

where $\epsilon$ is a positive parameter, chosen small so as not to “eclipse”
the actual ratings. The NCD preferences matrix is formally defined below.

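NCD Preferences Matrix W. The NCD preferences matrix is defined to propagate
each user's ratings to the many related elements (in the
$\mathcal{D}$-decomposition sense) of the item space. Formally, matrix
$\mathbf{W}$ is defined as follows:

$$\mathbf{W} \triangleq \mathbf{Z}\mathbf{X}^\intercal \tag{7}$$

where matrix $\mathbf{X}$ denotes the row normalized version of
$\mathbf{A}_{\mathcal{D}}$, and the $il$-th element of matrix $\mathbf{Z}$
equals $(n_{u_i}^{l})^{-1}[\mathbf{R}\mathbf{A}_{\mathcal{D}}]_{il}$ when
$n_{u_i}^{l} > 0$, and zero otherwise.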

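The final recommendation vectors are produced by projecting the perturbed data
onto an $f$-dimensional space. In particular, the final recommendation vectors
are defined to be the rows of matrix

$$\boldsymbol{\Pi} \triangleq \mathbf{U}_f \boldsymbol{\Sigma}_f \mathbf{V}_f^\intercal, \tag{8}$$

where matrix $\boldsymbol{\Sigma}_f \in \mathbb{R}^{f \times f}$ is a diagonal
matrix containing the first $f$ singular values of $\mathbf{G}$, and matrices
$\mathbf{U}_f \in \mathbb{R}^{n \times f}$ and
$\mathbf{V}_f \in \mathbb{R}^{m \times f}$ are orthonormal matrices containing
the corresponding left and right singular vectors.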

Remark 1. In fact, the recommendation vectors produced by Eq. (8) can be seen
as arising from a low dimensional eigenspace of an NCD-aware inter-item
similarity matrix. We discuss this further in Appendix A.

2.2.4. ColdStart Component 

In some cases the sparsity phenomenon becomes so intense and localized that the
perturbation of the ratings through matrix $\mathbf{W}$ is not enough. Take for
example newly emerging users in an existing recommender system. Naturally,
because these users are new, the number of ratings they introduce in the RS is
usually not sufficient for making reliable recommendations. If one takes into
account only their direct interactions with the items, the recommendations to
these newly added users are very likely to be restricted to small subsets of
$\mathcal{V}$, leaving the majority of the item space uncovered.

To address this problem, which represents one of the continuing difficulties
faced by recommender systems in operation, we create a ColdStart subcomponent
based on a discrete Markov chain model over the item space with transition
probability matrix $\mathbf{S}$, defined to bring together the direct as well
as the decomposable structure of the underlying space. Matrix $\mathbf{S}$ is
defined to consist of three components, namely a rank-one preference matrix
$\mathbf{e}\boldsymbol{\omega}^\intercal$ that arises from the explicit ratings
of the user as presented in the training set; a direct proximity matrix
$\mathbf{H}$, that depicts the direct inter-item relations; and an NCD
proximity matrix $\mathbf{D}$ that relates every item with its proximal sets.
Concretely, matrix $\mathbf{S}$ is given by:

$$\mathbf{S} \triangleq (1-\alpha)\mathbf{e}\boldsymbol{\omega}^\intercal + \alpha(\beta\mathbf{H} + (1-\beta)\mathbf{D}) \tag{9}$$

with $\alpha$ and $\beta$ being positive real numbers for which
$\alpha, \beta < 1$ holds. Parameter $\alpha$ controls how frequently the
Markov chain “restarts” to the preference vector $\boldsymbol{\omega}$, whereas
parameter $\beta$ weights the involvement of the Direct and the NCD Proximity
matrices in the final Markov chain model. The personalized ranking vector for
each newly added user is defined to be the stationary probability distribution
of the Markov chain that corresponds to the stochastic matrix $\mathbf{S}$,
using the normalized ratings of the user as the initial distribution.

Direct Proximity Matrix H. The direct proximity matrix $\mathbf{H}$ is designed
to capture the direct relations between the elements of $\mathcal{V}$.
Generally, every such element will be associated with a discrete distribution
$\mathbf{h}_v = [h_1, h_2, \ldots, h_m]$ over $\mathcal{V}$, that reflects the
correlation between these elements. In our case, we use the stochastic matrix
defined as follows:

$$\mathbf{H} \triangleq \mathrm{diag}(\mathbf{C}\mathbf{e})^{-1}\mathbf{C} \tag{10}$$

where $\mathbf{C}$ is an $m \times m$ matrix whose $ij$-th element is defined
to be $[\mathbf{C}]_{ij} \triangleq \mathbf{r}_i^\intercal\mathbf{r}_j$ for
$i \neq j$, zero otherwise, and $\mathbf{e}$ is a properly sized vector of
ones.

NCD Proximity Matrix D. The NCD proximity matrix $\mathbf{D}$ is created to
depict the interlevel connections between the elements of the item space. In
particular, each row of matrix $\mathbf{D}$ denotes a probability vector
$\mathbf{d}_v$, that distributes evenly its mass between the $N_v$ blocks of
$\mathscr{D}_v$, and then, uniformly to the included items of each block.
Formally, matrix $\mathbf{D}$ is defined by:

$$\mathbf{D} \triangleq \mathbf{X}\mathbf{Y} \tag{11}$$

where $\mathbf{X}$, $\mathbf{Y}$ denote the row normalized versions of
$\mathbf{A}_{\mathcal{D}}$ and $\mathbf{A}_{\mathcal{D}}^\intercal$
respectively.

Lemma 1. Matrices $\mathbf{H}$, $\mathbf{D}$ are well defined row stochastic
matrices.

Proof. We will begin with matrix $\mathbf{H}$. First, notice that for matrix
$\mathbf{H}$ to be well defined it is necessary for
$\mathrm{diag}(\mathbf{C}\mathbf{e})$ to be invertible. But this is assured by
our model's assumption that every item has been rated by at least one user.
Indeed, when this assumption holds, every row of matrix $\mathbf{C}$ denotes a
non-zero vector in $\mathbb{R}^m$, thus $\mathbf{C}\mathbf{e}$ denotes a vector
of strictly positive elements, which makes the diagonal matrix
$\mathrm{diag}(\mathbf{C}\mathbf{e})$ invertible, as needed.

For matrix $\mathbf{D}$ it suffices to show that for any
$\mathcal{D}$-decomposition, every column and every row of the corresponding
aggregation matrix $\mathbf{A}_{\mathcal{D}}$ denote non-zero vectors in
$\mathbb{R}^m$ and $\mathbb{R}^K$ respectively. The former is ensured by the
fact that NCD blocks are defined to be non-empty, whereas the latter condition
holds because the union of the $\mathcal{D}$-blocks is a cover of the item
space. □
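Lemma 1 can be checked numerically on a toy instance. The sketch below builds $\mathbf{H}$ from Eq. (10) and $\mathbf{D}$ from Eq. (11) with plain Python lists; the ratings, the block memberships and the helper names are hypothetical and serve only as an illustration.

```python
import math

def matmul(A, B):
    """Product of two dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def row_normalize(M):
    """Divide every row by its sum (rows are assumed to be non-zero)."""
    return [[x / sum(row) for x in row] for row in M]

def direct_proximity(R):
    """H = diag(Ce)^{-1} C with [C]_ij = r_i^T r_j for i != j, else 0."""
    C = matmul(transpose(R), R)      # inner products of the item columns
    for i in range(len(C)):
        C[i][i] = 0                  # the definition zeroes the diagonal
    return row_normalize(C)

def ncd_proximity(A):
    """D = X Y, the row normalized A times the row normalized A^T."""
    return matmul(row_normalize(A), row_normalize(transpose(A)))

def is_row_stochastic(M):
    return all(min(row) >= 0 and math.isclose(sum(row), 1.0) for row in M)

# Hypothetical toy data: 3 users, 3 items, 2 overlapping blocks {0,1}, {1,2}.
R = [[5, 3, 0],
     [0, 4, 2],
     [1, 0, 5]]
A = [[1, 0],
     [1, 1],
     [0, 1]]
H = direct_proximity(R)
D = ncd_proximity(A)
```

In this toy case each item is co-rated with at least one other item, so every row of $\mathbf{C}$ is non-zero and the normalization in `direct_proximity` is well defined. The diagonal of `D` is strictly positive, a fact that the aperiodicity argument later relies on.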

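Example 1. To clarify the definition of the NCD matrices $\mathbf{W}$,
$\mathbf{D}$, we give the following example. Consider a simple movie
recommendation system consisting of an item space of 8 movies and a user space
of 10 users, each having rated at least one movie. Let the set of ratings,
$\mathcal{R}$, be the one presented below:

$$\mathcal{R} = \left\{ \begin{array}{lll}
(u_4, v_1, 1), & (u_7, v_1, 4), & (u_8, v_1, 5), \\
(u_{10}, v_1, 5), & (u_5, v_2, 5), & (u_1, v_3, 4), \\
(u_2, v_3, 5), & (u_5, v_3, 2), & (u_9, v_3, 2), \\
(u_{10}, v_3, 5), & (u_3, v_4, 2), & (u_4, v_4, 5), \\
(u_5, v_4, 4), & (u_9, v_4, 1), & (u_1, v_5, 1), \\
(u_5, v_5, 5), & (u_6, v_5, 5), & (u_7, v_5, 3), \\
(u_3, v_6, 3), & (u_{10}, v_6, 5), & (u_3, v_7, 1), \\
(u_3, v_8, 5), & (u_6, v_8, 5), & (u_8, v_8, 5)
\end{array} \right\} \tag{12}$$

Assume also that the 8 movies of the item space belong to 3 genres as seen
below:

$$\mathcal{D}_1 = \{v_1, v_2, v_4\}, \quad \mathcal{D}_2 = \{v_3, v_4, v_5, v_8\}, \quad \mathcal{D}_3 = \{v_2, v_5, v_6, v_7, v_8\} \tag{13}$$

The corresponding aggregation matrix
$\mathbf{A}_{\mathcal{D}} \in \mathbb{R}^{8 \times 3}$ is

$$\mathbf{A}_{\mathcal{D}} = \begin{pmatrix}
1 & 0 & 0 \\
1 & 0 & 1 \\
0 & 1 & 0 \\
1 & 1 & 0 \\
0 & 1 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1 \\
0 & 1 & 1
\end{pmatrix} \tag{14}$$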
Following the definition of matrix $\mathbf{W}$ we get the matrix shown in
Figure 2. For the factor matrices $\mathbf{Z}$, $\mathbf{X}$ we have:

Similarly, in Figure 3 we give the detailed computation of the inter-item NCD
Proximity matrix $\mathbf{D}$ of the ColdStart component.

2.2.5. Theoretical Properties of the ColdStart 
Subcomponent 

Informally, the introduction of the NCD proximity matrix $\mathbf{D}$ helps the
item space become more “connected”, allowing the recommender to reach more
items even for the set of newly added users. When the blocks are overlapping
this effect becomes stronger, and in fact, item space coverage can be
guaranteed under certain conditions.

Fig. 3. We see the matrix D that corresponds to Example 1.

Theorem 1 (ItemSpace Coverage). If the block coupling graph
$\mathcal{G}_{\mathcal{D}}$ is connected, there exists a unique steady state
distribution $\boldsymbol{\pi}$ of the Markov chain corresponding to matrix
$\mathbf{S}$ that depends on the preference vector $\boldsymbol{\omega}$;
however, irrespective of any particular such vector, the support of this
distribution includes every item of the underlying space.

Proof. Before we proceed to the actual proof, we give a short sketch: When
$\mathcal{G}_{\mathcal{D}}$ is connected, the Markov chain induced by the
stochastic matrix $\mathbf{S}$ consists of a single irreducible and aperiodic
closed set of states, that includes all the items. To prove the irreducibility
part, we will show that the NCD proximity stochastic matrix that corresponds to
a connected block coupling graph ensures that, starting from any particular
state of the chain, there is a positive probability of reaching every other
state. For the aperiodicity part we will show that matrix $\mathbf{D}$ makes it
possible for the Markov chain to return to any given state in consecutive time
epochs. The above is true for every stochastic vector $\boldsymbol{\omega}$,
and for all positive real numbers $\alpha, \beta < 1$.

Lemma 2. The connectivity of $\mathcal{G}_{\mathcal{D}}$ implies the
irreducibility of the Markov chain with transition probability matrix
$\mathbf{D}$.

Proof. From the decomposition theorem of Markov chains we know that the state
space $S$ can be partitioned uniquely as

$$S = T \cup C_1 \cup C_2 \cup \cdots \tag{15}$$

where $T$ is the set of transient states, and the $C_i$ are irreducible closed
sets of persistent states.

Furthermore, since $S$ is finite, at least one state is persistent and all
persistent states are non-null (see [19], page 225). Let $i$ be such a
persistent state and $C$ the irreducible closed set it belongs to. We will
prove that the connectivity of $\mathcal{G}_{\mathcal{D}}$ alone ensures that,
starting from this state $i$, we can visit every other state of the Markov
chain. In other words, the connectivity of $\mathcal{G}_{\mathcal{D}}$ implies
that $T = \emptyset$ and there exists only one irreducible closed set of
persistent states.

Assume, for the sake of contradiction, that $\mathcal{G}_{\mathcal{D}}$ is
connected and there exists a state $j$ outside the set $C$. This, by
definition, means that there exists no path that starts in state $i$ and ends
in state $j$.

Here we will show that when $\mathcal{G}_{\mathcal{D}}$ is connected, it is
always possible to construct such a path. Let $v_i$ be the item corresponding
to state $i$ and $v_j$ the item corresponding to state $j$. Let
$\mathscr{D}_{v_i}$ be the proximal set of items of $v_i$. We must have one of
the following cases:

$v_j \in \mathscr{D}_{v_i}$. In this case, the states are directly connected,
and Pr{next is $j$ | we are in $i$} equals:

$$[\mathbf{D}]_{ij} = \sum_{k \,:\, v_i, v_j \in \mathcal{D}_k} \frac{1}{N_{v_i}\,|\mathcal{D}_k|} > 0 \tag{16}$$

which can be seen by Eq. (11) together with the definitions of Section 2.2.2.

$v_j \notin \mathscr{D}_{v_i}$. In this case, the states are not directly
connected. Let $\mathcal{D}_{v_j}$ be a $\mathcal{D}$-block that contains
$v_j$, and $\mathcal{D}_{v_i}$ a $\mathcal{D}$-block that contains $v_i$.
Notice that $v_j \notin \mathscr{D}_{v_i}$ implies that
$\mathcal{D}_{v_i} \cap \mathcal{D}_{v_j} = \emptyset$. However, since
$\mathcal{G}_{\mathcal{D}}$ is assumed connected, there exists a sequence of
vertices corresponding to $\mathcal{D}$-blocks, that forms a path in the block
coupling graph between nodes $\mathcal{D}_{v_i}$ and $\mathcal{D}_{v_j}$. Let
this sequence be the one below:

$$\mathcal{D}_{v_i}, \mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_n, \mathcal{D}_{v_j} \tag{17}$$

Then, choosing arbitrarily one state that corresponds to an item belonging to
each of the $\mathcal{D}$-blocks of the above sequence, we get the sequence of
states:

$$i, t_1, t_2, \ldots, t_n, j \tag{18}$$

which corresponds to the sequence of items

$$v_i, v_{t_1}, v_{t_2}, \ldots, v_{t_n}, v_j \tag{19}$$

Notice that the definition of the $\mathcal{D}$-blocks, together with the
definitions of the proximal sets and the block coupling graph, imply that this
sequence has the property that every item after $v_i$ belongs to the proximal
set of the item preceding it, i.e.

$$v_{t_1} \in \mathscr{D}_{v_i},\ v_{t_2} \in \mathscr{D}_{v_{t_1}},\ \ldots,\ v_j \in \mathscr{D}_{v_{t_n}} \tag{20}$$

Thus, the consecutive states in sequence (18) communicate, or

$$i \to t_1 \to t_2 \to \cdots \to t_n \to j \tag{21}$$

and there exists a positive probability path between states $i$ and $j$.

In conclusion, when $\mathcal{G}_{\mathcal{D}}$ is connected there will always
be a path starting from state $i$ and ending in state $j$. But because state
$i$ is persistent, and belongs to the irreducible closed set of states $C$,
state $j$ belongs to the same irreducible closed set of states too. This
contradicts our assumption. Thus, when $\mathcal{G}_{\mathcal{D}}$ is connected
every state belongs to a single irreducible closed set of states, $C$. □

Now it remains to prove the aperiodicity property.

Lemma 3. The Markov chain induced by matrix $\mathbf{D}$ is aperiodic.

Proof. It is known that the period of a state $i$ is defined as the greatest
common divisor of the epochs at which a return to the state is possible [19].
Thus, it suffices to show that we can return to any given state in consecutive
time epochs. But this can be seen readily because the diagonal elements of
matrix $\mathbf{D}$ are, by definition, all greater than zero; thus, for any
state and for every possible trajectory of the Markov chain of length $k$ there
is another one of length $k + 1$ with the same starting and ending state, that
follows the self loop as its final step. In other words, leaving any given
state of the corresponding Markov chain, one can always return in consecutive
time epochs, which makes the chain aperiodic. And the proof is complete. □

We have shown so far that the connectivity of $\mathcal{G}_{\mathcal{D}}$ is
enough to ensure the irreducibility and aperiodicity of the Markov chain with
transition probability matrix $\mathbf{D}$.

It remains now to prove that the same thing holds for the complete stochastic
matrix $\mathbf{S}$. This can be done using the following useful lemma, the
proof of which can be found in Appendix B.

Lemma 4. If $\mathbf{A}$ is the transition matrix of an irreducible and
aperiodic Markov chain with finite state space, and $\mathbf{B}$ the transition
matrix of any Markov chain defined onto the same state space, then matrix
$\mathbf{C} = \kappa\mathbf{A} + \lambda\mathbf{B}$, where
$\kappa, \lambda > 0$ such that $\kappa + \lambda = 1$, denotes the transition
matrix of an irreducible and aperiodic Markov chain also.

Applying Lemma 4 twice, first to matrix:

$$\mathbf{T} = \beta\mathbf{H} + (1-\beta)\mathbf{D} \tag{22}$$

and then to matrix:

$$\mathbf{S} = (1-\alpha)\mathbf{e}\boldsymbol{\omega}^\intercal + \alpha\mathbf{T} \tag{23}$$

gives us the irreducibility and the aperiodicity of matrix $\mathbf{S}$. Taking
into account the fact that the state space is finite, the resulting Markov
chain becomes ergodic and there exists a unique recommendation vector
corresponding to its steady state probability distribution, which is given by

$$\boldsymbol{\pi} = [\pi_1\ \pi_2\ \cdots\ \pi_m] = \left[\frac{1}{\mu_1}\ \frac{1}{\mu_2}\ \cdots\ \frac{1}{\mu_m}\right] \tag{24}$$

where $\mu_i$ is the mean recurrence time of state $i$. However, for ergodic
states, by definition it holds that

$$1 \leq \mu_i < \infty \tag{25}$$

Thus $\pi_i > 0$, for all $i$, and the support of the distribution that defines
the recommendation vector includes every item of the underlying space.

□
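The role of Lemma 4 in the argument above can be seen from a short sketch. What follows is one possible line of reasoning under the lemma's own assumptions, given only as an illustration and not as a reproduction of the appendix proof; it uses the standard fact that an irreducible and aperiodic chain on a finite state space has a primitive transition matrix.

```latex
% Sketch only; \kappa, \lambda, A, B, C as in the statement of Lemma 4.
% A irreducible and aperiodic on a finite state space => A is primitive:
\exists\, n \geq 1 : \quad \mathbf{A}^{n} > 0 \quad \text{(entrywise)}.
% Expanding the power of C = \kappa A + \lambda B gives a sum of products of
% nonnegative matrices, one of which is \kappa^{n} A^{n}:
\mathbf{C}^{n} = (\kappa\mathbf{A} + \lambda\mathbf{B})^{n}
  \;\geq\; \kappa^{n}\mathbf{A}^{n} \;>\; 0 .
% Hence C is primitive, i.e. irreducible and aperiodic.
```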

The above theorem suggests that even for a user who has rated only one item,
when the chosen decomposition enjoys the above property, our recommender finds
a way to assign preference probabilities for the complete item space. Note that
the criterion for this to be true is not that restrictive. For example, for the
MovieLens datasets, using as a criterion of decomposition the categorization of
movies into genres, the block coupling graph is connected. This proves to be a
very useful property in dealing with the cold-start problem, as we will see in
the experimental evaluation presented in Section 3.
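The coverage property can be illustrated numerically. The sketch below forms $\mathbf{S}$ explicitly from Eq. (9) for a hypothetical user who has rated a single item, and runs a plain power iteration. The data are toy values chosen so that the direct relations alone split the items into two disconnected groups; the run with $\beta = 1$ lies outside the model's range and is included only as a contrast.

```python
import math

def matmul(A, B):
    """Product of two dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def row_normalize(M):
    return [[x / sum(row) for x in row] for row in M]

# Hypothetical data: 2 users, 4 items; items {0,1} and {2,3} are never co-rated.
R = [[5, 4, 0, 0],
     [0, 0, 2, 5]]
# Two overlapping blocks {0,1} and {1,2,3}: the block coupling graph is connected.
A = [[1, 0], [1, 1], [0, 1], [0, 1]]

C = matmul(transpose(R), R)
for i in range(len(C)):
    C[i][i] = 0
H = row_normalize(C)                                        # Eq. (10)
D = matmul(row_normalize(A), row_normalize(transpose(A)))   # Eq. (11)

def coldstart_matrix(omega, H, D, alpha, beta):
    """S = (1 - alpha) e omega^T + alpha (beta H + (1 - beta) D)."""
    m = len(omega)
    return [[(1 - alpha) * omega[j]
             + alpha * (beta * H[i][j] + (1 - beta) * D[i][j])
             for j in range(m)] for i in range(m)]

def stationary(S, start, iters=50):
    """Power iteration pi^T <- pi^T S, started from the given distribution."""
    pi = list(start)
    for _ in range(iters):
        pi = [sum(pi[i] * S[i][j] for i in range(len(S)))
              for j in range(len(S))]
    return pi

omega = [1.0, 0.0, 0.0, 0.0]          # new user who rated only item 0
pi_ncd = stationary(coldstart_matrix(omega, H, D, 0.01, 0.75), omega)
pi_direct = stationary(coldstart_matrix(omega, H, D, 0.01, 1.0), omega)
```

With the NCD component switched on, every item receives a strictly positive score, whereas the direct-only chain never leaves the group of items co-rated with item 0.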

2.3. NCDREC Algorithm: Storage and 
Computational Issues 

It is clear that for the majority of reasonable decompositions the number of
blocks is much smaller than the cardinality of the item space, i.e.
$K \ll m$; this makes matrices $\mathbf{D}$ and $\mathbf{W}$ extremely
low-rank. Thus, if we take into account the inherent sparsity of the ratings
matrix $\mathbf{R}$, and of the component matrices $\mathbf{X}$, $\mathbf{Y}$,
$\mathbf{Z}$, we see that the storage needs of NCDREC are in fact modest.

Furthermore, the fact that matrices $\mathbf{G}$ and $\mathbf{S}$ can be
expressed as a sum of sparse and low-rank components can also be exploited
computationally, as we see in the NCDREC algorithm presented in Algorithm 1.
Our algorithm makes sure that the computation of the recommendation vectors can
be carried out without needing to explicitly compute matrices $\mathbf{G}$ and
$\mathbf{S}$.

The computation of the singular triplets is based on a fast partial SVD method
proposed by Baglama and Reichel. However, because their method presupposes the
existence of the final matrix, we modified the partial Lanczos
bidiagonalization iterative procedure to take advantage of the factorization of
the NCD preferences matrix $\mathbf{W}$ into matrices $\mathbf{X}$,
$\mathbf{Z}$. The detailed computation is presented in the NCD_PartialLBD
procedure in Algorithm 1. For the computation of the newly added users'
recommendations, we collect their preference vectors in an extremely sparse
matrix $\boldsymbol{\Omega}$, and we compute their stationary distributions
using a batch power method approach exploiting matrices $\mathbf{X}$,
$\mathbf{Y}$. Notice that the exploitation of the factorization of the NCD
matrices in both procedures results in a significant drop in the number of
floating point operations per iteration, since every dense Matrix×Vector (MV)
multiplication is now replaced by a sum of lower dimensional and sparse MVs,
making the overall method significantly faster.

3. Experimental Evaluation 

In order to evaluate the performance of NCDREC in recommending top-N lists of
items, we run a number of experiments using two real datasets: the
Yahoo!R2Music, which represents a real snapshot of the Yahoo! Music community's
preferences for various songs, and the standard MovieLens (1M and 100K)
datasets. These datasets also come with information that relates the items to
genres; this was chosen as the criterion of decomposition behind the definition
of matrices $\mathbf{D}$ and $\mathbf{W}$. For further details about these
datasets see http://webscope.sandbox.yahoo.com and http://grouplens.org/. A
synopsis of their basic characteristics is presented in Table 1.

Exploiting meta-information is a very useful weapon in alleviating sparsity related problems [13]. Thus, in order to provide fair comparisons we test our method against recommendation methods that:

(a) can also take advantage of the categorization of items to genres and,

(b) are known to show lower sensitivity to the problems of limited coverage and sparsity [3].

In particular, we run NCDREC against five state-of-the-art graph-based approaches: the node similarity algorithms $L^{\dagger}$ and Katz; the random walk approaches First Passage Time (FP) and Commute Time (CT); and the Matrix Forest Algorithm (MFA).

Footnote: For NCDREC, the perturbation parameter $\varepsilon$ was set to 0.01, the number of latent factors was selected from the range 2 to 800, and the ColdStart subcomponent parameters were chosen to be $\alpha = 0.01$ and $\beta = 0.75$.

Algorithm 1 NCDREC Algorithm

Input: Matrices $R \in \mathbb{R}^{n \times m}$, $H \in \mathbb{R}^{m \times m}$, $X \in \mathbb{R}^{m \times K}$, $Y \in \mathbb{R}^{K \times m}$, $Z \in \mathbb{R}^{n \times K}$. Parameters $\alpha, \beta, f, \varepsilon$.
Output: The matrix with recommendation vectors for every user, $\Pi \in \mathbb{R}^{n \times m}$.

Step 1: Find the newly added users and collect their preference vectors into matrix $\Omega$.
Step 2: Compute $\Pi_{sparse}$ using the ColdStart procedure.
Step 3: Initialize vector $p_1$ to be a random unit length vector.
Step 4: Compute the modified Lanczos procedure up to step M, using NCD_PartialLBD with starting vector $p_1$.
Step 5: Compute the SVD of the bidiagonal matrix B to extract $f < M$ approximate singular triplets: $\{u_j, \sigma_j, v_j\} \leftarrow \{Q u_j^{(B)}, \sigma_j^{(B)}, P v_j^{(B)}\}$.
Step 6: Orthogonalize against the approximate singular vectors to get a new starting vector $p_1$.
Step 7: Continue the Lanczos procedure for M more steps using the new starting vector.
Step 8: Check for convergence tolerance. If met compute matrix $\Pi_{full} = U \Sigma V^{T}$, else go to Step 4.
Step 9: Update $\Pi_{full}$, replacing the rows that correspond to new users with $\Pi_{sparse}$.
return $\Pi_{full}$

procedure NCD_PartialLBD(R, X, Z, $p_1$, $\varepsilon$)
    $\phi \leftarrow X^{T} p_1$;  $q_1 \leftarrow R p_1 + \varepsilon Z \phi$;
    $b_{1,1} \leftarrow \|q_1\|_2$;  $q_1 \leftarrow q_1 / b_{1,1}$;
    for $j = 1$ to $M$ do
        $\phi \leftarrow Z^{T} q_j$;
        $r \leftarrow R^{T} q_j + \varepsilon X \phi - b_{j,j}\, p_j$;
        $r \leftarrow r - [p_1 \ldots p_j]([p_1 \ldots p_j]^{T} r)$;
        if $j < M$ then
            $b_{j,j+1} \leftarrow \|r\|$;  $p_{j+1} \leftarrow r / b_{j,j+1}$;
            $\phi \leftarrow X^{T} p_{j+1}$;
            $q_{j+1} \leftarrow R p_{j+1} + \varepsilon Z \phi - b_{j,j+1}\, q_j$;
            $q_{j+1} \leftarrow q_{j+1} - [q_1 \ldots q_j]([q_1 \ldots q_j]^{T} q_{j+1})$;
            $b_{j+1,j+1} \leftarrow \|q_{j+1}\|$;
            $q_{j+1} \leftarrow q_{j+1} / b_{j+1,j+1}$;
        end if
    end for
end procedure

procedure ColdStart(H, X, Y, $\Omega$, $\alpha$, $\beta$)
    $\Pi \leftarrow \Omega$;  $k \leftarrow 0$;  $r \leftarrow 1$;
    while $r > tol$ and $k < maxit$ do
        $k \leftarrow k + 1$;
        $\hat{\Pi} \leftarrow \alpha\beta\, \Pi H$;  $\Phi \leftarrow \Pi X$;
        $\hat{\Pi} \leftarrow \hat{\Pi} + \alpha(1 - \beta)\, \Phi Y + (1 - \alpha)\, \Omega$;
        $r \leftarrow \|\hat{\Pi} - \Pi\|$;  $\Pi \leftarrow \hat{\Pi}$;
    end while
    return $\Pi_{sparse} \leftarrow \Pi$
end procedure

3.1. Competing Recommendation Methods

The data model used for all the competing methods is a graph representation of the recommender system database. Concretely, consider a weighted graph G with nodes corresponding to database elements and edges corresponding to database links. For example, in the MovieLens datasets each element of the people set, the movie set, and the movie_category set corresponds to a node of the graph, and each has_watched and belongs_to link is expressed as an edge ifllll .

Generally speaking, graph-based recommendation methods work by computing similarity measures between every element in the recommender database and then using these measures to compute ranked lists of the items with respect to each user.
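As a rough illustration of this family of methods, the sketch below computes two of the similarity measures listed in this section on a tiny graph: a truncated Katz series and average first passage times obtained by iterating their recurrence. The four-node graph (user, watched item, category, unseen item), the function names and the truncation length are assumptions chosen for the example; the closed-form matrix inversions used in the experiments are not reproduced here.

```python
def katz_scores(A, alpha, terms=60):
    """Truncated Katz series sum_{t=1..terms} alpha^t A^t (needs alpha < 1/rho(A))."""
    n = len(A)
    term = [[alpha * A[i][j] for j in range(n)] for i in range(n)]   # alpha A
    K = [row[:] for row in term]
    for _ in range(terms - 1):
        # Next term of the series: (alpha^t A^t) (alpha A).
        term = [[alpha * sum(term[i][k] * A[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


def first_passage_times(P, k, tol=1e-12, maxit=100000):
    """Iterate FP(k|k) = 0, FP(k|i) = 1 + sum_j p_ij FP(k|j) for i != k."""
    n = len(P)
    fp = [0.0] * n
    for _ in range(maxit):
        new = [0.0 if i == k else 1.0 + sum(P[i][j] * fp[j] for j in range(n))
               for i in range(n)]
        delta = max(abs(a - b) for a, b in zip(new, fp))
        fp = new
        if delta < tol:
            break
    return fp


# Path graph: user (0) - watched item (1) - category (2) - unseen item (3).
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
# Random-walk transition matrix: each row of A divided by the node degree.
P = [[a / sum(row) for a in row] for row in A]

K = katz_scores(A, alpha=0.1)
fp_to_unseen = first_passage_times(P, k=3)
```

Ranking the item nodes for a user then amounts to sorting them by decreasing Katz score, or by increasing first passage time from the user node.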


The pseudoinverse of the graph's Laplacian ($L^{\dagger}$). This matrix contains the inner products of the node vectors in a Euclidean space where the nodes are exactly separated by the commute time distance OlSll . For the computation of the matrix we used the formula:

$$L^{\dagger} = \left(L - \frac{e e^{T}}{n+m+K}\right)^{-1} + \frac{e e^{T}}{n+m+K} \qquad (26)$$

where L is the Laplacian of the graph model of the recommender system, e is the vector of all ones, n the number of users, m the number of items, and K the number of blocks (see Hi for details).

The MFA similarity matrix M. The MFA matrix contains elements that also provide similarity measures between nodes of the graph by integrating indirect paths, based on the matrix-forest theorem Igt]. Matrix M was computed by

$$M = (I + L)^{-1} \qquad (27)$$

where I is the identity matrix.

Table 1
Datasets

Dataset          #Users      #Items    #Ratings      Density
MovieLens100K    943         1,682     100,000       6.30%
MovieLens1M      6,040       3,883     1,000,209     4.26%
Yahoo!R2Music    1,823,179   136,736   717,872,016   0.29%

The Katz similarity matrix K. The Katz similarity matrix is computed by

$$K = \alpha A + \alpha^{2} A^{2} + \cdots = (I - \alpha A)^{-1} - I \qquad (28)$$

where A is the adjacency matrix of the graph and $\alpha$ measures the attenuation in a link (see 12311 ).


Average First Passage Times. The Average First Passage Time scores are computed by iteratively solving the recurrence

$$\begin{cases} FP(k|k) = 0 \\ FP(k|i) = 1 + \sum_{j} p_{ij}\, FP(k|j), & \text{for } i \neq k \end{cases} \qquad (29)$$

where $p_{ij}$ is the conditional probability that a random walker in the graph G visits node j next, given that it is currently in node i.

Average Commute Times. Finally, Average Commute Times scores can be obtained in terms of the Average First-Passage Times by:

$$CT(i,j) = FP(i|j) + FP(j|i) \qquad (30)$$

For further details about the competing algorithms see I Tslfldll^lEB I and the references therein.

3.2. Quality of Recommendation

To evaluate the quality of our method in suggesting top-N items, we have adopted the methodology used in im. In particular, we randomly sampled 1.4% of the ratings of the dataset in order to create a probe set V, and we use each item $v_j$ rated with 5 stars by user $u_i$ in V to form the test set T. Finally, for each item in T, we randomly select another 1000 unrated items of the same user, we rank the resulting lists of 1001 items using the different methods mentioned, and we evaluate the quality of the recommendations.

For this evaluation, in addition to the standard Recall and Precision metrics 14111 111 , we also use a number of other well known ranking measures, which discount the utility of recommended items depending on their position in the recommendation list 1 51^ 1: namely the R-Score, the Normalized Discounted Cumulative Gain and the Mean Reciprocal Rank metrics. R-Score assumes that the value of recommendations declines exponentially fast, yielding for each user the following score:

$$R(\alpha) = \sum_{q} \frac{\max(y_{\pi_q} - d,\, 0)}{2^{\frac{q-1}{\alpha-1}}} \qquad (31)$$

where $\alpha$ is a half-life parameter which controls the exponential decline, $\pi_q$ is the index of the q-th item in the recommendation ranking list $\pi$, y is a vector of the relevance values for a sequence of items, and d denotes a neutral (task-dependent) rating. In Discounted Cumulative Gain (DCG) the ranking positions are discounted logarithmically; it is defined as:

$$DCG@k(y,\pi) = \sum_{q=1}^{k} \frac{2^{y_{\pi_q}} - 1}{\log_2(2+q)} \qquad (32)$$

The Normalized Discounted Cumulative Gain can then be defined as:

$$NDCG@k = \frac{DCG@k(y,\pi)}{DCG@k(y,\pi^{*})} \qquad (33)$$

where $\pi^{*}$ is the best possible ordering of the items with respect to the relevance scores (see for details). Finally, Mean Reciprocal Rank (MRR) is the average of each user's reciprocal rank score, defined as follows:

$$RR = \frac{1}{\min_{q}\{q : y_{\pi_q} > 0\}} \qquad (34)$$

MRR decays more slowly than R-Score but faster than NDCG.
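The three position-discounted measures can be sketched in a few lines. The code assumes the standard forms of the half-life R-Score, of DCG with a $\log_2(2+q)$ discount starting at position q = 1, and of the reciprocal rank (each stated in the comments); the function names, the default neutral rating d = 0 and the toy relevance values are assumptions made for the example.

```python
import math


def r_score(y, ranking, alpha, d=0.0):
    """Half-life utility: sum_q max(y[pi_q] - d, 0) / 2^((q-1)/(alpha-1))."""
    return sum(max(y[item] - d, 0.0) / 2 ** ((q - 1) / (alpha - 1))
               for q, item in enumerate(ranking, start=1))


def dcg_at_k(y, ranking, k):
    """DCG@k: sum_{q=1..k} (2^y[pi_q] - 1) / log2(2 + q)."""
    return sum((2 ** y[item] - 1) / math.log2(2 + q)
               for q, item in enumerate(ranking[:k], start=1))


def ndcg_at_k(y, ranking, k):
    """DCG@k of the ranking divided by DCG@k of the best possible ordering."""
    ideal = sorted(ranking, key=lambda item: y[item], reverse=True)
    best = dcg_at_k(y, ideal, k)
    return dcg_at_k(y, ranking, k) / best if best > 0 else 0.0


def reciprocal_rank(y, ranking):
    """1 / (position of the first relevant item); 0 if none is relevant."""
    for q, item in enumerate(ranking, start=1):
        if y[item] > 0:
            return 1.0 / q
    return 0.0


# Toy case: one relevant item ("a"), ranked second by the recommender.
y = {"a": 1, "b": 0, "c": 0}
ranking = ["b", "a", "c"]

scores = {
    "R(5)": r_score(y, ranking, alpha=5),
    "NDCG@3": ndcg_at_k(y, ranking, k=3),
    "RR": reciprocal_rank(y, ranking),
}
```

MRR is then simply the mean of `reciprocal_rank` over all users.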

Fig. 4. Recommendation quality on MovieLens1M and Yahoo!R2Music datasets using Recall@N, Precision and NDCG@N metrics

Figure 4 reports the performance of the algorithms on the Recall, Precision and NDCG metrics. In particular, we report the average Recall as a function of N (focusing on the range N = 1, ..., 20), the Precision at a given Recall, and the NDCG@N, for the MovieLens1M (1st column) and Yahoo!R2Music (2nd column) datasets. As we can see, NCDREC outperforms all other methods, reaching for example at N = 10 a Recall of around 0.53 on MovieLens1M and 0.45 on the sparser Yahoo!R2Music dataset. Similar behavior is observed for the Precision and the NDCG metrics as well. Table 2 presents the results for the R-Score (with half-life parameters 5 and 10) and the MRR metrics. Again we see that NCDREC achieves the best results, with MFA and $L^{\dagger}$ doing significantly better than the other graph-based approaches in the sparser dataset.
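For readers who want to reproduce curves of this kind, the sketch below shows how Recall@N and Precision@N can be derived from the 1001-item ranking protocol of this section. It assumes the usual convention for this protocol, in which a test case counts as a hit when the held-out item is ranked within the top N and precision equals recall divided by N; the function names and the toy ranks are illustrative.

```python
def rank_of_test_item(score, test_item, candidates):
    """1-based rank of the test item among itself plus the sampled candidates.

    score      : dict item -> recommendation score for one user
    candidates : the randomly sampled unrated items (1000 in the protocol above)
    Ties are resolved in favour of the test item (an assumption of this sketch).
    """
    s = score[test_item]
    return 1 + sum(1 for c in candidates if score[c] > s)


def recall_precision_at_n(ranks, N):
    """ranks: one rank per test case. Returns (Recall@N, Precision@N)."""
    hits = sum(1 for r in ranks if r <= N)
    recall = hits / len(ranks)
    # Each test case has exactly one relevant item, hence precision = recall / N.
    return recall, recall / N


# Toy usage with hypothetical scores for one test case.
score = {"test": 0.7, "x1": 0.9, "x2": 0.4, "x3": 0.1}
example_rank = rank_of_test_item(score, "test", ["x1", "x2", "x3"])

# Hypothetical ranks of four test items within their 1001-item lists.
recall10, precision10 = recall_precision_at_n([1, 5, 12, 30], N=10)
```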

Finally, for completeness, we also run NCDREC on the standard MovieLens100K dataset using the publicly available 5 predefined splittings into training and test sets. Here, we use the Degree of Agreement metric (a variant of Somers' D statistic that has been used by many authors for the performance evaluation of ranking-based recommendations on MovieLens100K) in order to allow direct comparisons with the different results to be found in the literature lfiMIl2i5l .

NCDREC obtained a macro-averaged DOA score of 92.25 and a micro-averaged DOA of 90.74, which are, to the best of our knowledge, the highest scores achieved thus far on this benchmark dataset.


3.3. Long-Tail Recommendation 


It is well known that the distribution of rated items in recommender systems is long-tailed, i.e. the majority of the ratings is concentrated in a few very popular items. Of course, recommending popular items is generally considered an easy task and adds very little utility in recommender systems. On the other hand, the task of recommending long-tail items adds novelty and serendipity to the users ll111 , and it is also known to increase the profits of e-commerce companies significantly 1 ^5011 . The inherent sparsity of the data however, which is magnified even more for long-tail items, presents a major challenge for most state-of-the-art collaborative filtering methods.

In order to evaluate NCDREC in recommending long-tail items we adopt the methodology described in lull . In particular, we order the items according to their popularity (the popularity was measured in terms of number of ratings) and we further partition the test set T into two subsets, $T_{head}$ and $T_{tail}$, that involve items originating from the short head and the long tail of the distribution, respectively. We discard the popular items and we evaluate NCDREC and the other algorithms on the $T_{tail}$ test set, using the procedure explained in the previous section. Figure 5 and Table 3 report the results.
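The partition of the test set can be sketched as follows. The text does not state where the short head ends, so the cut-off `head_share` (the fraction of all ratings covered by the most popular items) is a hypothetical parameter of this sketch, as are the function name and the toy data.

```python
from collections import Counter


def split_head_tail(train_ratings, test_set, head_share=0.33):
    """Split test pairs into (T_head, T_tail) by item popularity.

    train_ratings : iterable of (user, item) pairs used to measure popularity
    test_set      : list of (user, item) pairs
    head_share    : hypothetical cut-off; the most popular items that together
                    cover this fraction of the ratings form the short head
    """
    counts = Counter(item for _, item in train_ratings)
    total = sum(counts.values())
    head, covered = set(), 0
    for item, c in counts.most_common():      # items by decreasing popularity
        if covered >= head_share * total:
            break
        head.add(item)
        covered += c
    t_head = [(u, i) for u, i in test_set if i in head]
    t_tail = [(u, i) for u, i in test_set if i not in head]
    return t_head, t_tail


# Toy data: item "a" has 6 ratings, "b" has 3 and "c" has 1.
train = ([(f"u{k}", "a") for k in range(6)]
         + [(f"u{k}", "b") for k in range(3)]
         + [("u0", "c")])
test = [("u7", "a"), ("u8", "b"), ("u9", "c")]
t_head, t_tail = split_head_tail(train, test, head_share=0.5)
```

Only `t_tail` would then be passed to the evaluation procedure of the previous section.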

We see that NCDREC again achieves the best results, managing to retain its performance in all metrics and for both datasets. Notice here the significant drop in quality of the random walk based methods, which were found to behave very well in the standard recommendation scenario. This finding indicates their bias in recommending popular items. MFA and $L^{\dagger}$ on the other hand do particularly well, exhibiting great ability in uncovering non-trivial relations between the items, especially in the sparser Yahoo!R2Music dataset.

3.4. Recommendations for Newly Emerging Users 

One very common manifestation of sparsity faced by real recommender systems is the New-Users Problem. This problem refers to the difficulty of achieving reliable recommendations for newly emerging users in an existing recommender system, due to the de facto initial lack of personalized feedback. This problem can also be seen as an extreme and localized expression of sparsity that prevents CF methods from uncovering meaningful relations between the set of new users and the rest of the RS database, and thus undermines the reliability of the produced recommendations.

To evaluate the performance of our method in coping with this problem we run the following experiment. We randomly select 100 users from the MovieLens1M dataset having rated 100 movies or more, and we randomly select 4%, 6%, 8% and 10% of their ratings to include in new, artificially “sparsified” versions of the dataset. The idea is that the modified data represent “earlier snapshots” of the system, when these users were new and, as such, had rated fewer items. We run NCDREC against the other methods, and we compare the recommendation vectors with the ranking lists induced by the complete set of ratings, which we use as the reference ranking for each user.
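A minimal sketch of the “sparsification” step is given below. It assumes ratings are stored per user as a dictionary and that at least one rating is always kept; both choices, like the function name and the synthetic profile, are assumptions of the example rather than details given in the text.

```python
import random


def sparsify_user(ratings, fraction, rng):
    """Keep a random `fraction` of one user's ratings (at least one rating).

    ratings : dict item -> rating for a single user
    rng     : random.Random instance, so that the sampling is reproducible
    """
    n_keep = max(1, round(fraction * len(ratings)))
    kept = rng.sample(sorted(ratings), n_keep)
    return {item: ratings[item] for item in kept}


rng = random.Random(0)
# Synthetic profile of a user who has rated 100 movies.
full_profile = {f"movie_{k}": 1 + k % 5 for k in range(100)}
snapshots = {f: sparsify_user(full_profile, f, rng)
             for f in (0.04, 0.06, 0.08, 0.10)}
```

Each snapshot plays the role of the user's profile at an earlier point in time, while `full_profile` induces the reference ranking.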


Footnote: We give a detailed definition of the DOA metric in Section 3.4, where we also present other ranking stability metrics.

Footnote: Note that the ranking list for the set of newly added users was produced by the ColdStart subcomponent.

Fig. 5. Long tail recommendation quality on MovieLens1M and Yahoo!R2Music datasets using Recall@N, Precision and NDCG@N metrics

Table 2
Recommendation quality on MovieLens1M and Yahoo!R2Music datasets using R-Score and MRR metrics

            MovieLens1M                   Yahoo!R2Music
Method      R(5)      R(10)     MRR       R(5)      R(10)     MRR
NCDREC      0.3997    0.5098    0.3008    0.3539    0.4587    0.2647
MFA         0.1217    0.1911    0.0887    0.2017    0.2875    0.1591
L†          0.1216    0.1914    0.0892    0.1965    0.2814    0.1546
FP          0.2054    0.2874    0.1524    0.1446    0.2241    0.0998
Katz        0.2187    0.3020    0.1642    0.1704    0.2529    0.1203
CT          0.2070    0.2896    0.1535    0.1465    0.2293    0.1019

Table 3
Long tail recommendation quality on MovieLens1M and Yahoo!R2Music datasets using R-Score and MRR metrics

            MovieLens1M                   Yahoo!R2Music
Method      R(5)      R(10)     MRR       R(5)      R(10)     MRR
NCDREC      0.3279    0.4376    0.2395    0.3520    0.4322    0.2834
MFA         0.1660    0.2517    0.1188    0.2556    0.3530    0.1995
L†          0.1654    0.2507    0.1193    0.2492    0.3461    0.1939
FP          0.0183    0.0654    0.0221    0.0195    0.0684    0.0224
Katz        0.0275    0.0822    0.0267    0.0349    0.0939    0.0309
CT          0.0192    0.0675    0.0227    0.0215    0.0747    0.0249


For this comparison, in addition to the standard Spearman's $\rho$ and Kendall's $\tau$ metrics 1 31^ 1, we also use two other well known ranking measures, namely the Degree of Agreement (DOA) I TsIllhlllS I and the Normalized Distance-based Performance Measure (NDPM) 1^1, outlined below. Table 4 contains all the necessary definitions.


Kendall's $\tau$ is an intuitive nonparametric rank correlation index that has been widely used in the literature. The $\tau$ of ranking lists $r^{i}$, $\pi^{i}$ is defined to be:

$$\tau = \frac{C - D}{\sqrt{N - T_r}\,\sqrt{N - T_\pi}} \qquad (35)$$

and takes the value of 1 for perfect match and -1 for reversed ordering.

Spearman's $\rho$ is another widely used non-parametric measure of rank correlation. The $\rho$ of ranking lists $r^{i}$, $\pi^{i}$ is defined to be:

$$\rho = \frac{1}{m} \cdot \frac{\sum_{v_j} (r^{i}_{v_j} - \bar{r}^{i})(\pi^{i}_{v_j} - \bar{\pi}^{i})}{\sigma(r^{i})\,\sigma(\pi^{i})} \qquad (36)$$

where $\bar{\cdot}$ and $\sigma(\cdot)$ denote the mean and standard deviation. The $\rho$ takes values from -1 to 1. A $\rho$ of 1 indicates perfect rank association, a $\rho$ of zero indicates no association between the ranking lists and a $\rho$ of -1 indicates a perfect negative association of the rankings.

Degree of Agreement (DOA) is a performance index commonly used in the recommendation literature to evaluate the quality of ranking-based CF methods iflsIfThlfll^ DOA is a variant of the Somers' D statistic 04^ . defined as follows:

$$DOA_i = \frac{\sum_{v_j \in T_i \wedge v_k \in W_i} [\pi^{i}_{v_j} > \pi^{i}_{v_k}]}{|T_i| \cdot |W_i|}, \qquad W_i = \overline{(L_i \cup T_i)} \qquad (37)$$

where [S] equals 1 if statement S is true and zero otherwise; here, following the standard definition of DOA, $T_i$ and $L_i$ denote the items of user $u_i$ that belong to the test set and to the training set respectively. Macro-averaged DOA (macro-DOA) is the average of all $DOA_i$ and micro-averaged DOA (micro-DOA) is the ratio between the aggregate number of item pairs in the correct order and the total number of item pairs checked (for further details see I fslflh l).

Table 4
A summary of the notation used for the definition of the ranking stability metrics

Notation          Meaning
$r^{i}$           User $u_i$'s reference ranking
$\pi^{i}$         Recommender System generated ranking
$r^{i}_{v_j}$     Ranking score of the item $v_j$ in user $u_i$'s ranking list (reference ranking)
$\pi^{i}_{v_j}$   Ranking score of the item $v_j$ in user $u_i$'s ranking list (Recommender System generated ranking)
C                 Number of pairs that are concordant
D                 Number of discordant pairs
N                 Total number of pairs
$T_r$             Number of tied pairs in the reference ranking
$T_\pi$           Number of tied pairs in the system ranking
X                 Number of pairs where the reference ranking does not tie, but the RS's ranking ties ($N - T_r - C - D$)

Normalized Distance-based Performance Measure. The NDPM of ranking lists $r^{i}$, $\pi^{i}$ is defined to be:

$$NDPM = \frac{D + 0.5X}{N - T_r} \qquad (38)$$

The NDPM measure gives a perfect score of 0 to RS that correctly predict every preference relation asserted by the reference. The worst score of 1 is assigned to recommendation vectors that contradict every preference relation in $r^{i}$ 1 4^1147 ].
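The four ranking stability metrics can be computed from simple pair counts. The sketch below assumes the tie-corrected form $\tau = (C - D)/\sqrt{(N - T_r)(N - T_\pi)}$ for Kendall's $\tau$, the Pearson correlation of the ranking scores for Spearman's $\rho$, NDPM $= (D + 0.5X)/(N - T_r)$, and a DOA that counts how often a test item is scored above an item the user has not interacted with; function names and toy scores are illustrative, and degenerate inputs (all pairs tied) are not handled.

```python
import math
from itertools import combinations


def pair_counts(r, pi):
    """r, pi: dicts item -> ranking score (reference, system)."""
    items = sorted(r)
    C = D = T_r = T_pi = X = 0
    for a, b in combinations(items, 2):
        dr, dp = r[a] - r[b], pi[a] - pi[b]
        if dr == 0:
            T_r += 1                      # pair tied in the reference ranking
        if dp == 0:
            T_pi += 1                     # pair tied in the system ranking
        if dr != 0 and dp == 0:
            X += 1                        # reference orders the pair, system ties
        elif dr * dp > 0:
            C += 1                        # concordant pair
        elif dr * dp < 0:
            D += 1                        # discordant pair
    N = len(items) * (len(items) - 1) // 2
    return C, D, N, T_r, T_pi, X


def kendall_tau(r, pi):
    C, D, N, T_r, T_pi, _ = pair_counts(r, pi)
    return (C - D) / math.sqrt((N - T_r) * (N - T_pi))


def ndpm(r, pi):
    C, D, N, T_r, _, X = pair_counts(r, pi)
    return (D + 0.5 * X) / (N - T_r)


def spearman_rho(r, pi):
    items = sorted(r)
    m = len(items)
    mr = sum(r[v] for v in items) / m
    mp = sum(pi[v] for v in items) / m
    cov = sum((r[v] - mr) * (pi[v] - mp) for v in items) / m
    sr = math.sqrt(sum((r[v] - mr) ** 2 for v in items) / m)
    sp = math.sqrt(sum((pi[v] - mp) ** 2 for v in items) / m)
    return cov / (sr * sp)


def doa(pi, test_items, other_items):
    """Fraction of (test item, other item) pairs ranked in the correct order."""
    good = sum(1 for j in test_items for k in other_items if pi[j] > pi[k])
    return good / (len(test_items) * len(other_items))


# Toy rankings: the system swaps items "b" and "c" of the reference.
r = {"a": 4, "b": 3, "c": 2, "d": 1}
pi = {"a": 4, "b": 2, "c": 3, "d": 1}
tau, rho, dist = kendall_tau(r, pi), spearman_rho(r, pi), ndpm(r, pi)
```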


High scores on the first three metrics ($\rho$, $\tau$, DOA) and a low score on the last (NDPM) suggest that the two ranking lists are “close”, which means that the new users are more likely to receive recommendations closer to their tastes as described by their full set of ratings.

In Figure 6 we report the average scores on all four metrics, for the set of newly added users. We see that NCDREC clearly outperforms every other method considered, achieving good results even when only 4% of each user's ratings were included. MFA and $L^{\dagger}$ also do well, especially as the number of ratings increases. These results are in accordance with the intuition behind our approach and the theoretical properties of the ColdStart subcomponent. We see that, even though new users' tastes are not yet clear, the exploitation of NCD proximity captured by matrix D manages to “propagate” this scarce rating information to the many related elements of the item space, giving our method an advantage in uncovering new users' preferences. This leads to a recommendation vector exhibiting lower sensitivity to sparsity.


4. Conclusions and Future Work 


In this work we proposed NCDREC, a novel method that builds on the intuition behind Decomposability to provide an elegant and computationally efficient framework for generating recommendations. NCDREC exploits the innately hierarchical structure of the item space, introducing the notion of NCD proximity, which characterizes inter-level relations between the elements of the system and gives our model useful antisparsity theoretical properties.

One very interesting direction we are currently pursuing involves the generalization of the ColdStart subcomponent exploiting the functional rankings family Jl. In particular, based on a recently proposed multidamping reformulation of these rankings 1 2^1^ that allows intuitive and fruitful interpretations of the damping functions in terms of random surfing habits, one could try to capture the actual newly emerging users' behavior as they begin to explore the recommender system, and map it to suitable collections of personalized damping factors that could lead to even better recommendations. Another interesting research path that remains to be explored involves the introduction of more than one decomposition based on different criteria, and the effect this has on the theoretical properties of the ColdStart subcomponent. Notice that in NCDREC this generalization can be achieved readily, through the introduction of new low-rank proximity matrices $D_1, W_1, D_2, W_2, \ldots$ and associated parameters, with no effect on the dimensionality of the model.

Fig. 6. Recommendation performance for New Users problem


In this work, we considered the single decomposition case. Our experiments on the MovieLens and the Yahoo!R2Music datasets indicate that NCDREC outperforms several state-of-the-art graph-based algorithms, known for their antisparsity properties, in widely used performance metrics, being at the same time by far the most economical one. Note here that the random-walk approaches, FP and CT, require handling a graph of (n + m + K) nodes and computing 2nm first passage time scores. Similarly, $L^{\dagger}$, Katz and MFA involve the inversion of an (n + m + K)-dimensional square matrix. In fact, only NCDREC involves matrices whose dimensions depend solely on the cardinality of the item space, which in most realistic applications increases slowly.

In conclusion, our findings suggest that NCDREC carries the potential of handling sparsity effectively, and of producing high quality results in standard, long-tail as well as cold-start recommendation scenarios.

Appendix

A. Theoretical Discussion of NCDREC’s Main 
Component 

Let us consider the singular value decomposition of matrix G,

$$G = U \Sigma V^{T} \qquad (39)$$

Multiplying from the right with V and using the fact that its columns form an orthonormal set of vectors we get

$$G V = U \Sigma \qquad (40)$$

Multiplying from the right with the diagonal matrix $\mathrm{Diag}(1, \ldots, 1, 0, \ldots, 0)$, whose first f diagonal entries are equal to 1, gives

$$G \begin{pmatrix} V_f & 0 \end{pmatrix} = U \begin{pmatrix} \Sigma_f & 0 \\ 0 & 0 \end{pmatrix} \qquad (41)$$

and finally, discarding the zero columns we get

$$G V_f = U_f \Sigma_f \qquad (42)$$

Now plugging this in Eq. (8) we see that the recommendation vector for the user $u_i$, $\pi_i^{T}$, is given by:

$$\pi_i^{T} = g_i^{T} V_f V_f^{T} \qquad (43)$$

where $g_i^{T}$ denotes the i-th row of G. Notice that $V_f$ contains an orthonormal set of eigenvectors of the m × m symmetric positive semidefinite matrix

$$\begin{aligned} G^{T} G &= (R + \varepsilon W)^{T} (R + \varepsilon W) \\ &= (R^{T} + \varepsilon W^{T})(R + \varepsilon W) \\ &= R^{T} R + \varepsilon (R^{T} W + W^{T} R) + \varepsilon^{2} W^{T} W \end{aligned} \qquad (44)$$

Thus the recommendation vectors produced by the main component of NCDREC can be seen as arising from a low dimensional eigenspace of the NCD-perturbed inter-item similarity matrix, as seen in Eq. (44).

B. Proof of Lemma 4 

Lemma 4. If A is the transition matrix of an irreducible and aperiodic Markov chain with finite state space, and B the transition matrix of any Markov chain defined on the same state space, then matrix $C = \kappa A + \lambda B$, where $\kappa, \lambda > 0$ such that $\kappa + \lambda = 1$, denotes the transition matrix of an irreducible and aperiodic Markov chain also.

Proof. It is easy to see that for $\kappa, \lambda > 0$ such that $\kappa + \lambda = 1$ matrix C is also a valid transition probability matrix. Furthermore, when A is irreducible there exists a positive probability path between any two given states of the corresponding Markov chain. The same path will also be valid for the Markov chain that corresponds to matrix C, as long as $\kappa > 0$. The same is true for the aperiodicity property: the addition of the stochastic matrix B can only add possible paths, so every path that allows a return to a given state of the Markov chain that corresponds to matrix A remains a positive probability return path in the chain that corresponds to C, and the greatest common divisor of the possible return times stays equal to 1. Thus, the irreducibility and the aperiodicity of A, together with the requirement $\kappa > 0$, imply the existence of those properties for the final matrix C, as needed. □
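The statement of the lemma can be checked numerically on a small example. The sketch relies on the standard fact that a finite chain is irreducible and aperiodic exactly when its transition matrix is primitive, i.e. some power of it is entrywise positive, and on Wielandt's bound $(n-1)^2 + 1$ for the power that needs to be inspected; the matrices and the weights $\kappa = 0.3$, $\lambda = 0.7$ are arbitrary illustrative choices.

```python
def is_primitive(P):
    """True if some power P^t with t <= (n-1)^2 + 1 is entrywise positive."""
    n = len(P)
    # Work on the zero/non-zero pattern only, which avoids numerical underflow.
    pattern = [[1 if x > 0 else 0 for x in row] for row in P]
    power = [row[:] for row in pattern]
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        power = [[1 if any(power[i][k] and pattern[k][j] for k in range(n)) else 0
                  for j in range(n)] for i in range(n)]
    return False


# A: cycle 0 -> 1 -> 2 -> 0 with a self-loop at state 0 (irreducible, aperiodic).
A = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
# B: an arbitrary chain on the same states (here reducible and periodic).
B = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]

kappa, lam = 0.3, 0.7
C = [[kappa * A[i][j] + lam * B[i][j] for j in range(3)] for i in range(3)]
```

Here `is_primitive(A)` and `is_primitive(C)` hold while `is_primitive(B)` does not, in agreement with the lemma.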




